A Registry of Standard Data Categories for Linguistic Annotation

نویسندگان

  • Nancy Ide
  • Laurent Romary
چکیده

In this paper we describe the most recent work within ISO TC37/SC 4, and in particular the development of a Data Category Registry (DCR) component of the Linguistic Annotation Framework. The DCR will contain a formally defined set of linguistic categories in common use within the language engineering community for reference and use in linguistically annotated resources. We outline the first proposals for creation and management of the DCR, as a solicitation for input from the community.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Cross-linguistic annotation of modality: a data-driven hierarchical model

We present an annotation model of modality which is (i) cross-linguistic, relying on a wide, strongly typologically motivated approach, and (ii) hierarchical and layered, accounting for both factuality and speaker’s attitude, while modelling these two aspects through separate annotation schemes. Modality is defined through cross-linguistic categories, but the classification of actual linguistic...

متن کامل

Automatic Annotation and Information Retrieval

Textual information extraction, and particularly the extraction of information from web-based text, requires the annotation of a great number of documents very quickly, using standard categories: syntactical, grammatical (identifying tenses and aspects), lexical (the identification of " transfer " verbs, " donation " verbs, " localization " verbs …) and communicative categories (identifying rel...

متن کامل

Methodological Aspects of Semantic Annotation

This paper constitutes a preliminary report on the work carried out on semantic content annotation in the LIRICS project, in close collaboration with the activities of ISO TC 37/SC 4/TDG 3. This consists primarily of: (1) identifying commonalities in alternative approaches to the annotation and representation of various types of semantic information; and (2) developing methodological principles...

متن کامل

ISOcat: Corralling Data Categories in the Wild

To achieve true interoperability for valuable linguistic resources different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry. This registry will provide a reusable set of data categories. A new implementation, dubbed ISOcat, of the registry is currently under construction. This pap...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004